IN THE CLAIMS:

Please find below a listing of all pending claims. The statuses of the claims
are set forth in parentheses. For those currently amended claims, underlined
emphasis indicates insertions and strikethrough emphasis (and/or double brackets)
indicates deletions.

1. (cancelled) 

2. (previously presented) The method according to claim 6, wherein

the LU decomposition is executed in parallel by each processor of each node in a
recursive procedure.
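
For orientation only, a recursive formulation of LU decomposition can be sketched as below. This is a minimal single-process sketch and not the claimed parallel procedure: it assumes a square matrix with non-zero leading pivots, does no pivoting, and peels off one row and column per recursion level. The function name `lu_recursive` is hypothetical.

```python
def lu_recursive(a):
    """Return (L, U) such that L times U equals the square matrix a.

    L is unit lower triangular and U is upper triangular.
    Assumption: no pivoting, so every leading pivot must be non-zero.
    """
    size = len(a)
    if size == 1:
        return [[1.0]], [[float(a[0][0])]]

    pivot = float(a[0][0])
    if pivot == 0.0:
        raise ZeroDivisionError("zero pivot; this sketch does no pivoting")

    # First column of L below the diagonal, and first row of U right of it.
    col = [a[i][0] / pivot for i in range(1, size)]
    row = [float(x) for x in a[0][1:]]

    # Trailing submatrix (Schur complement), updated with the computed panel.
    schur = [
        [a[i][j] - col[i - 1] * row[j - 1] for j in range(1, size)]
        for i in range(1, size)
    ]

    # Recurse on the part that is not yet LU-decomposed.
    l_sub, u_sub = lu_recursive(schur)

    lower = [[1.0] + [0.0] * (size - 1)]
    lower += [[col[i]] + l_sub[i] for i in range(size - 1)]
    upper = [[pivot] + row]
    upper += [[0.0] + u_sub[i] for i in range(size - 1)]
    return lower, upper
```

The update of the trailing submatrix in the sketch plays the same role as the update step recited in claim 6, applied here to a single element-wide panel rather than to a block.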

3. (currently amended) The method according to claim 6, wherein

in said update step, while computing part of a block of the matrix which is not yet
LU-decomposed [[a row block]], each node transfers data that belongs to a computed
block and is needed to update other blocks, to other nodes in parallel to the
computation.

4. (previously presented) The method according to claim 6, wherein

said parallel computer is a SMP node distributed-memory type parallel computer in
which each node is a SMP (symmetric multi-processor).

5. (currently amended) A matrix processing device of a parallel computer in
which [[a plurality of processors and]] a plurality of nodes, each including [[memory]] a
memory and a processor are connected through a network, wherein the plurality of
nodes includes n nodes, comprising:

a first allocation unit [[dividing]] to divide an array A(1:k,1:k) of a matrix to be
processed by the number n of nodes to create n divided matrices and [[assigning]] to
assign [[divided matrices]] subarrays A(1:k/n,1:k), ... , A(k(n-1)/n:k, 1:k) to the divided
matrices, [[dividing]] to divide one of the subarrays into a narrow block by an integer
m, and wherein each [[node's reading]] node is to read data from memory so that a
first narrow block is placed in a first node of the plurality of nodes, a second narrow
block is placed in a second node of the plurality of nodes, ... , a m-th narrow block is
placed in a (mod(m-1,n)+1)-th node of the plurality of nodes, wherein mod(m-1,n)
is modulo of m-1 divided by n;

a separation unit [[eliminating]] to eliminate data corresponding to a diagonal [[blocks]]
block A(nbase:nbase+m, nbase:nbase+m), where nbase being a position of an
upperleft element of the diagonal block and m [[are]] being [[intergers]] integers, from
data of the narrow blocks placed in each node, at each node;

a second allocation unit to redundantly [[allocating]] allocate same data as the diagonal
block which is eliminated at each node to each node commonly;

an LU decomposition unit [[applying]] to apply LU decomposition to both the diagonal
block and [[the]] an allocated block by the first allocation unit in parallel in each node;
and

an update unit [[updating]] to update the block of the matrix which is not yet
LU-decomposed, using an LU-decomposed block, at each node[[.]],

thereby realizing fast LU decomposition of a matrix effectively using a hardware of 
the parallel computer. 

6. (currently amended) A matrix processing method of a parallel computer in which
[[a plurality of processors and]] a plurality of n nodes each including [[memory]] a
memory and a processor are connected through a network, comprising:

dividing an array A(1:k,1:k) of a matrix to be processed by the number n of nodes
and assigning [[divided matrices]] subarrays A(1:k/n,1:k), ... , A(k(n-1)/n:k, 1:k) to
divided matrices, dividing one of the subarrays into narrow block by integer m, and
each node's reading data from memory of each node so that a first narrow block is
placed in a first node, a second narrow block is placed in a second node, ... , a m-th
narrow block is placed in a (mod(m-1,n)+1)-th node, wherein mod(m-1,n) is modulo
of m-1 divided by n;

eliminating data corresponding to a diagonal [[blocks]] block A(nbase:nbase+m,
nbase:nbase+m), where nbase being a position of an upperleft element of the
diagonal block and m are [[intergers]] integers, from data of the narrow blocks placed
in each node, at each node;

redundantly allocating same data as the diagonal block which is eliminated at each 
node to each node commonly; 

applying LU decomposition to both the diagonal block and [[the]] an allocated block by
the assigning step in parallel in each node; and

updating the blocks of the matrix which is not yet LU-decomposed, using an
LU-decomposed block, at each node[[.]],

thereby realizing fast LU decomposition of a matrix effectively using a hardware of 
the parallel computer. 
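
The cyclic placement of narrow blocks recited in claim 6 above can be illustrated with a short sketch. It is offered only as a reading aid, under the assumption that blocks and nodes are both numbered from 1 as in the claim; the names `owner_node` and `distribute_blocks` and the parameter `block_index` are hypothetical and do not appear in the claims.

```python
def owner_node(block_index, n):
    """Return the 1-based node that holds the given 1-based narrow block.

    Mirrors the claim expression mod(m-1, n) + 1, with block_index in
    the role of the block ordinal m and n the number of nodes.
    """
    if block_index < 1 or n < 1:
        raise ValueError("block_index and n must be positive")
    return (block_index - 1) % n + 1


def distribute_blocks(num_blocks, n):
    """Map each node (1..n) to the list of narrow blocks placed on it."""
    placement = {node: [] for node in range(1, n + 1)}
    for block_index in range(1, num_blocks + 1):
        placement[owner_node(block_index, n)].append(block_index)
    return placement
```

With n = 4 nodes and 6 narrow blocks, for example, blocks 1 and 5 land on node 1, blocks 2 and 6 on node 2, block 3 on node 3, and block 4 on node 4, so consecutive blocks are spread evenly across the nodes.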

7. (currently amended) A computer-readable storage medium on which is recorded a
program for enabling a computer to realize a matrix processing method as a parallel
computer in which [[a plurality of processors and]] a plurality of nodes each including
[[memory]] a memory and a processor are connected through a network, the method
comprising:

dividing an array A(1:k,1:k) of a matrix to be processed by the number n of nodes to
create divided matrices and assigning [[divided matrices]] subarrays A(1:k/n,1:k), ... ,
A(k(n-1)/n:k, 1:k) to divided matrices, dividing one of the subarrays into narrow
block by integer m, and each node's reading data from memory of each node so that
a first narrow block is placed in a first node, a second narrow block is placed in a
second node, ... , a m-th narrow block is placed in a (mod(m-1,n)+1)-th node,
wherein mod(m-1,n) is modulo of m-1 divided by n;

eliminating data corresponding to a diagonal [[blocks]] block A(nbase:nbase+m,
nbase:nbase+m), where nbase being a position of an upperleft element of the
diagonal block and m are [[intergers]] integers, from data of the narrow blocks placed
in each node, at each node;

redundantly allocating same data as the diagonal block which is eliminated at each
node to each node commonly;

applying LU decomposition to both the diagonal block and [[the]] an allocated block by
the assigning step in parallel in each node; and

updating the blocks of the matrix which is not yet LU-decomposed, using the
LU-decomposed block, at each node[[.]],

thereby realizing fast LU decomposition of a matrix effectively using a hardware of 
the parallel computer. 